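However, their performance degrades in the presence of noisy labels at train time. Inspired by the setting of learning with expert advice, where multiplicative weight (MW) updates were recently shown to be robust to moderate data corruptions in expert advice, we propose to use MW for reweighting examples during neural network optimization. We theoretically establish the convergence of our method when used with gradient descent and prove its advantages for label noise in 1D cases. We then validate our findings empirically by showing that MW improves neural network accuracy in the presence of label noise on CIFAR-10, CIFAR-100, and Clothing1M. We also show the impact of our approach on adversarial robustness.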
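The reweighting idea above can be illustrated with a minimal sketch. It assumes the textbook multiplicative-weights rule, in which each example's weight shrinks exponentially with its accumulated loss; the function name `mw_example_weights`, the step size `eta`, and the toy loss values are hypothetical and are not taken from the paper, whose exact update may differ.

```python
import numpy as np

def mw_example_weights(per_example_losses, eta=0.5):
    """Textbook multiplicative-weights rule applied to training examples.

    per_example_losses: array of shape (steps, n_examples) holding the loss
    each example incurred at each optimization step (assumed layout).
    Returns normalized weights; examples with large accumulated loss
    (likely mislabeled) receive small weight.
    """
    losses = np.asarray(per_example_losses, dtype=float)
    cumulative = losses.sum(axis=0)
    # Shift by the minimum before exponentiating for numerical stability.
    unnormalized = np.exp(-eta * (cumulative - cumulative.min()))
    return unnormalized / unnormalized.sum()

# Toy data (made up): the third example keeps a high loss, as a noisy label might.
losses_over_time = np.array([
    [0.2, 0.3, 2.5],
    [0.1, 0.2, 2.7],
])
weights = mw_example_weights(losses_over_time)

# A weighted training objective would then use these weights per example.
weighted_loss = float(np.dot(weights, losses_over_time[-1]))
```

In a training loop, the weights would be recomputed as losses accumulate and used to scale each example's contribution to the gradient.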
While recent artificial intelligence (AI) language models demonstrate cutting-edge performance on English texts, equivalent models either do not exist in other languages or do not reach the same performance level. This undesired effect of AI advancements widens the gap in access to new technology between populations across the world. This unintended bias mainly discriminates against individuals whose English skills are less developed, e.g., children who are not English speakers. Following significant advancements in AI research in recent years, OpenAI has recently presented DALL-E: a powerful tool for creating images based on English text prompts. While DALL-E is a promising tool for many applications, its decreased performance when given input in a different language limits its audience and deepens the gap between populations. An additional limitation of the current DALL-E model is that it only allows for the creation of a few images in response to a given input prompt, rather than a series of consecutive coherent frames that tell a story or describe a process that changes over time. Here, we present an easy-to-use automatic DALL-E storytelling framework that leverages the existing DALL-E model to enable fast and coherent visualizations of non-English songs and stories, pushing the limit of the one-step-at-a-time option DALL-E currently offers. We show that our framework is able to effectively visualize stories from non-English texts and portray the changes in the plot over time. It is also able to create a narrative and maintain interpretable changes in the description across frames. Additionally, our framework offers users the ability to specify constraints on the story elements, such as a specific location or context, and to maintain a consistent style throughout the visualization.
Recent work attributes progress in NLP to large language models (LMs) with increased model size and large quantities of pretraining data. Despite this, current state-of-the-art LMs for Hebrew are both under-parameterized and under-trained compared to LMs in other languages. Additionally, previous work on pretrained Hebrew LMs focused on encoder-only models. While the encoder-only architecture is beneficial for classification tasks, it is poorly suited to sub-word prediction tasks, such as Named Entity Recognition, given the morphologically rich nature of Hebrew. In this paper we argue that sequence-to-sequence generative architectures are more suitable for LLMs in the case of morphologically rich languages (MRLs) such as Hebrew. We demonstrate that by casting tasks in the Hebrew NLP pipeline as text-to-text tasks, we can leverage powerful multilingual, pretrained sequence-to-sequence models such as mT5, eliminating the need for a specialized, morpheme-based, separately fine-tuned decoder. Using this approach, our experiments show substantial improvements over previously published results on existing Hebrew NLP benchmarks. These results suggest that multilingual sequence-to-sequence models present a promising building block for NLP for MRLs.
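This study proposes an end-to-end unsupervised diffeomorphic deformable registration framework based on moving mesh parameterization. Using this parameterization, a deformation field can be modeled with the determinant of its transformation Jacobian and the curl of the end velocity field. The new model of the deformation field has three important advantages. First, it relaxes the need for an explicit regularization term and the corresponding weight in the cost function. Smoothness is implicitly embedded in the solution, which results in a physically plausible deformation field. Second, it guarantees diffeomorphism through explicit constraints applied to the transformation Jacobian determinant. Finally, it is suitable for cardiac data processing, since the nature of this parameterization is to define the deformation field in terms of radial and rotational components. The effectiveness of the algorithm is investigated by evaluating the proposed method on three different datasets that include 2D and 3D cardiac MRI scans. The results demonstrate that the proposed framework outperforms existing learning-based and non-learning-based methods while generating diffeomorphic transformations.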
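Emergent communication research often focuses on optimizing task-specific utility as a driver for communication. However, human languages appear to evolve under pressure to efficiently compress meanings into communication signals by optimizing the Information Bottleneck tradeoff between informativeness and complexity. In this work, we study how trading off these three factors (utility, informativeness, and complexity) shapes emergent communication, including in comparison to human communication. To this end, we propose Vector-Quantized Variational Information Bottleneck (VQ-VIB), a method for training neural agents to compress inputs into discrete signals embedded in a continuous space. We train agents via VQ-VIB and compare their performance to previously proposed neural architectures in grounded environments and in a Lewis reference game. Across all neural architectures and settings, taking communicative informativeness into account benefits communication convergence rates, and penalizing communicative complexity leads to human-like lexicon sizes while maintaining high utility. Additionally, we find that VQ-VIB outperforms other discrete communication methods. This work demonstrates how fundamental principles that are believed to characterize human language evolution may inform emergent communication in artificial agents.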
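The phrase "discrete signals embedded in a continuous space" can be made concrete with a small sketch of generic vector quantization: each continuous encoding is replaced by its nearest entry in a codebook, so a message is both a discrete token (the index) and a continuous vector (the codebook entry). This is only the standard nearest-neighbor lookup under assumed names (`quantize`, `codebook`) and made-up values; it is not the VQ-VIB method itself, whose variational and information-bottleneck terms are omitted here.

```python
import numpy as np

def quantize(encodings, codebook):
    """Map each continuous encoding to its nearest codebook vector.

    encodings: (batch, dim) array of continuous encoder outputs.
    codebook:  (num_codes, dim) array of candidate signal vectors.
    Returns (token indices, quantized vectors).
    """
    z = np.asarray(encodings, dtype=float)
    c = np.asarray(codebook, dtype=float)
    # Squared Euclidean distance between every encoding and every code.
    distances = ((z[:, None, :] - c[None, :, :]) ** 2).sum(axis=-1)
    tokens = distances.argmin(axis=1)
    return tokens, c[tokens]

# Hypothetical three-word lexicon in a 2-D signal space.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]])
encodings = np.array([[0.9, 1.2], [0.1, -0.2], [-0.8, 0.7]])
tokens, signals = quantize(encodings, codebook)
```

The number of distinct tokens actually used across inputs is one way to read off an effective lexicon size from such a codebook.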
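Google's operational flood forecasting system was developed to provide accurate real-time flood warnings to agencies and the public, with a focus on riverine floods in large, gauged rivers. It became operational in 2018 and has since expanded geographically. The forecasting system consists of four subsystems: data validation, stage forecasting, inundation modeling, and alert distribution. Machine learning is used for two of the subsystems. Stage forecasting is modeled with Long Short-Term Memory (LSTM) networks and linear models. Flood inundation is computed with the thresholding and manifold models, where the former computes inundation extent and the latter computes both inundation extent and depth. The manifold model, presented here for the first time, provides a machine-learning alternative to hydraulic modeling of flood inundation. When evaluated on historical data, all models achieve performance metrics that are sufficiently high for operational use. The LSTM showed higher skill than the linear model, while the thresholding and manifold models achieved similar performance metrics for modeling inundation extent. During the 2021 monsoon season, the flood warning system was operational in India and Bangladesh, covering flood-prone regions around rivers with a total area of 287,000 square kilometers, home to more than 350 million people. More than 100 million flood alerts were sent to affected populations, to relevant authorities, and to emergency organizations. Current and future work on the system includes extending coverage to additional flood-prone locations, as well as improving modeling capabilities and accuracy.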
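Boosting is a celebrated machine learning approach based on the idea of combining weak and moderately inaccurate hypotheses into a strong and accurate one. We study boosting under the assumption that the weak hypotheses belong to a class of bounded capacity. This assumption is inspired by the common convention that weak hypotheses are "rules of thumb" from an "easy-to-learn class". (Schapire and Freund '12, Shalev-Shwartz and Ben-David '14.) Formally, we assume the class of weak hypotheses has a bounded VC dimension. We focus on two main questions: (i) Oracle complexity: How many weak hypotheses are needed to produce an accurate hypothesis? We design a novel boosting algorithm and demonstrate that it circumvents a classical lower bound by Freund and Schapire ('95, '12). Whereas the lower bound shows that $\Omega(1/\gamma^2)$ weak hypotheses with $\gamma$-margin are sometimes necessary, our new method requires only $\tilde{O}(1/\gamma)$ weak hypotheses, provided that they belong to a class of bounded VC dimension. Unlike previous boosting algorithms, which aggregate the weak hypotheses by majority votes, the new boosting algorithm uses more complex ("deeper") aggregation rules. We complement this result by showing that complex aggregation rules are in fact necessary to circumvent the aforementioned lower bound. (ii) Expressivity: Which tasks can be learned by boosting weak hypotheses from a bounded VC class? Can complex concepts that are "far away" from the class be learned? Towards answering the first question, we introduce combinatorial-geometric parameters that capture the expressivity of boosting. As a corollary, we provide an affirmative answer to the second question for well-studied classes, including half-spaces and decision stumps. Along the way, we establish and exploit connections with discrepancy theory.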
Deep Neural Networks (DNNs) are analyzed via the theoretical framework of the information bottleneck (IB) principle. We first show that any DNN can be quantified by the mutual information between the layers and the input and output variables. Using this representation we can calculate the optimal information-theoretic limits of the DNN and obtain finite-sample generalization bounds. The advantage of getting closer to the theoretical limit is quantifiable both by the generalization bound and by the network's simplicity. We argue that the optimal architecture, that is, the number of layers and the features/connections at each layer, is related to the bifurcation points of the information bottleneck tradeoff, namely, relevant compression of the input layer with respect to the output layer. The hierarchical representations in the layered network naturally correspond to the structural phase transitions along the information curve. We believe that this new insight can lead to new optimality bounds and deep learning algorithms.
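Since the analysis above rests on mutual information between variables, a small worked computation may help. The sketch below evaluates the standard definition $I(X;Y) = \sum_{x,y} p(x,y) \log_2 \frac{p(x,y)}{p(x)p(y)}$ for a discrete joint distribution; the function name and the two toy distributions are illustrative assumptions, and estimating this quantity for real network layers is considerably harder than this example suggests.

```python
import numpy as np

def mutual_information(joint):
    """Mutual information I(X;Y) in bits for a discrete joint distribution.

    joint: 2-D array whose entry [x, y] is proportional to p(x, y).
    """
    p = np.asarray(joint, dtype=float)
    p = p / p.sum()                      # normalize to a probability table
    px = p.sum(axis=1, keepdims=True)    # marginal p(x), shape (nx, 1)
    py = p.sum(axis=0, keepdims=True)    # marginal p(y), shape (1, ny)
    mask = p > 0                         # 0 * log 0 is treated as 0
    ratio = p[mask] / (px @ py)[mask]
    return float((p[mask] * np.log2(ratio)).sum())

# Independent fair bits share no information; a perfect copy shares one bit.
independent = np.full((2, 2), 0.25)
perfect_copy = np.array([[0.5, 0.0], [0.0, 0.5]])
mi_independent = mutual_information(independent)
mi_copy = mutual_information(perfect_copy)
```

In the information bottleneck picture, a layer's representation would be placed by two such quantities: its information about the input and its information about the output.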
Reading comprehension of legal text can be a particularly challenging task due to the length and complexity of legal clauses and a shortage of expert-annotated datasets. To address this challenge, we introduce the Merger Agreement Understanding Dataset (MAUD), an expert-annotated reading comprehension dataset based on the American Bar Association's 2021 Public Target Deal Points Study, with over 39,000 examples and over 47,000 total annotations. Our fine-tuned Transformer baselines show promising results, with models performing well above random on most questions. However, on a large subset of questions, there is still room for significant improvement. As the only expert-annotated merger agreement dataset, MAUD is valuable as a benchmark for both the legal profession and the NLP community.
Nearly all jurisdictions in the United States require a professional license exam, commonly referred to as "the Bar Exam," as a precondition for law practice. To even sit for the exam, most jurisdictions require that an applicant complete at least seven years of post-secondary education, including three years at an accredited law school. In addition, most test-takers also undergo weeks to months of further, exam-specific preparation. Despite this significant investment of time and capital, approximately one in five test-takers still score below the threshold required to pass the exam on their first try. In the face of a complex task that requires such depth of knowledge, what, then, should we expect of the state of the art in "AI"? In this research, we document our experimental evaluation of the performance of OpenAI's `text-davinci-003` model, often referred to as GPT-3.5, on the multistate multiple choice (MBE) section of the exam. While we find no benefit in fine-tuning over GPT-3.5's zero-shot performance at the scale of our training data, we do find that hyperparameter optimization and prompt engineering positively impacted GPT-3.5's zero-shot performance. For the best prompt and parameters, GPT-3.5 achieves a headline correct rate of 50.3% on a complete NCBE MBE practice exam, significantly in excess of the 25% baseline guessing rate, and performs at a passing rate for both Evidence and Torts. GPT-3.5's ranking of responses is also highly correlated with correctness; its top two and top three choices are correct 71% and 88% of the time, respectively, indicating very strong non-entailment performance. While our ability to interpret these results is limited by nascent scientific understanding of LLMs and the proprietary nature of GPT, we believe that these results strongly suggest that an LLM will pass the MBE component of the Bar Exam in the near future.
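The top-two and top-three figures above are top-k accuracies over ranked answer choices. A minimal sketch of how such a metric can be computed follows; the function name, the rankings, and the answer key are made-up toy data and do not reproduce the study's questions or results.

```python
def top_k_accuracy(rankings, answers, k):
    """Fraction of questions whose correct answer is among the top k ranked choices.

    rankings: list of choice lists, each ordered from most to least preferred.
    answers:  list of correct choices, aligned with rankings.
    """
    hits = sum(1 for ranked, correct in zip(rankings, answers) if correct in ranked[:k])
    return hits / len(answers)

# Toy four-question exam with four choices each (hypothetical data).
rankings = [
    ["B", "A", "D", "C"],
    ["C", "D", "A", "B"],
    ["A", "C", "B", "D"],
    ["D", "B", "C", "A"],
]
answers = ["B", "D", "B", "A"]

top1 = top_k_accuracy(rankings, answers, k=1)
top2 = top_k_accuracy(rankings, answers, k=2)
top3 = top_k_accuracy(rankings, answers, k=3)

# With four choices per question, uniform guessing gets 1/4 right on average.
guess_baseline = 1 / 4
```

With four choices, top-k accuracy under uniform guessing is k/4, which is the natural reference point for interpreting top-two and top-three rates.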